# English Visual Question Answering
Jedi 7B 1080p
Apache-2.0
Qwen2.5-VL-7B-Instruct is a multimodal model based on the Qwen2.5 architecture that jointly processes images and text and is suited to vision-language tasks.
Image-to-Text
Safetensors English
xlangai
239
2
Gemma 3 27b It Abliterated Mlx Vlm 4Bit
This is a multimodal model in MLX format, converted from huihui-ai/gemma-3-27b-it-abliterated, that takes image and text input and generates text.
Image-to-Text
Transformers English

aimeri
264
0
Open Qwen2VL
CC
Open-Qwen2VL is a multimodal model capable of receiving both images and text as input and generating text output.
Image-to-Text English
weizhiwang
568
15
Smolvlm2 256M Video Instruct Mlx
Apache-2.0
This is a video-text-to-text model converted for the MLX framework, suitable for video understanding and instruction-following tasks.
Image-to-Text
Transformers English

mlx-community
591
7
Qwen2 VL 7B Instruct GGUF
Apache-2.0
Qwen2-VL-7B-Instruct is a 7B-parameter multimodal model supporting image-text interaction tasks.
Image-to-Text English
gaianet
102
2
Pix2struct Vizwizvqa Base
Apache-2.0
This is an English-language visual question answering model released under the Apache-2.0 license.
Text-to-Image
Transformers English

nanom
16
0